AN EMPIRICAL BAYES APPROACH TO OUTLIERS

Enrique de Alba and J. Van Ryzin

Technical Summary Report #1734
March 1977

ABSTRACT 


A formulation  of  the  problem  of  detecting  outliers  as  an  empirical 
Bayes  problem  is  studied.  In  so  doing  what  arises  is  a non-standard 
empirical  Bayes  problem  for  which  the  notion  of  average  risk  asymptotic 
optimality  (a.r.a.o.)  of  procedures  is  defined.  Some  general  theorems 
giving  sufficient  conditions  for  a.r.a.o.  procedures  are  developed.  These 
general  results  are  then  used  in  various  formulations  of  the  outlier  problem 
for  underlying  normal  distributions  to  give  a.r.a.o.  empirical  Bayes 
procedures.  Rates  of  convergence  results  are  also  given  using  the  methods 
of  Johns  and  Van  Ryzin  (1971,  1972). 
1. Introduction.

It  is  not  uncommon  to  find  samples  in  which  some  of  the  observations 
appear  suspiciously  far  away  from  the  main  group.  Such  observations 
are  often  called  outliers. 

The problem appears in the literature as early as 1838, when Bessel
mentioned the simple rule of not rejecting any observations. A large
body of work considers the causes of outliers and presents different
solutions. Several articles include excellent historical reviews of the
work done in this area; see, for example, de Alba (1974), Rider (1933),
Ferguson (1961a, 1961b) and Guttman and Smith (1969).

The outlier problem actually presents two aspects: i) identify any
particular observation (or observations) which comes from a distribution
other than the one assumed to explain the main body of the observations
(spurious observations); ii) obtain a procedure for the analysis of the
data which is not much affected (if at all) by the presence of spurious
observations or by the rejection of non-spurious observations.

This  paper  only  considers  the  first  aspect  of  the  problem. 


Research of this author was sponsored in part by DHEW, PHS, National Institutes
of Health under Grant 5 R01 CA 18332-02 and by the United States Army under
Contract No. DAAG29-75-C-0024.


2.  Empirical  Bayes  Terminology. 

We begin by stating the basic elements of decision theory which we
will be using:

i) The parameter space $\Theta$.
ii) A set $A$ of possible actions.
iii) A loss function $L(a, \lambda) \ge 0$, defined on $A \times \Theta$.
For any point $(a, \lambda) \in A \times \Theta$, $L(a, \lambda) < \infty$
is the loss that results when $\lambda \in \Theta$ is the true value of
the parameter and we take the action $a \in A$.
iv) The sample space $\mathcal{X}$, which is taken to be a finite dimensional
Euclidean space. For each $\lambda \in \Theta$ there is defined a c.d.f.
$F_\lambda(x)$.

We shall be working with behavioral decision rules. A behavioral
decision rule, $t(x)$, is a function which gives, for each $x$ in the sample
space, a probability distribution over $A$. The average loss when using
$t(x)$ is thus given by

$$L(t(x), \lambda) = E\,L(Z, \lambda),$$

where the expectation on $Z$ is taken with respect to $t(x)$. Further, the
risk function is defined as

$$r(t, \lambda) = E_\lambda L(t(X), \lambda),$$

where $E_\lambda$ denotes the expectation of $X$ when the value of the
parameter is $\lambda$. The minimum Bayes risk with respect to the prior
distribution $G$ on $\lambda$, or Bayes envelope functional, is

$$r(G) = r^*(t_G, G) = \inf_t r^*(t, G) = \inf_t \int r(t, \lambda)\,dG(\lambda),$$

where $t_G$ is a Bayes rule.

Suppose we are confronted with the same decision problem repeatedly
and independently. Let $(\Lambda_1, X_1), (\Lambda_2, X_2), \ldots,
(\Lambda_n, X_n), \ldots$ be a sequence of mutually independent pairs of
random variables, where $\Lambda_1, \Lambda_2, \ldots, \Lambda_n$ all have an
unknown common (prior) distribution $G$ defined on $\Theta$ and the
conditional distribution of $X_r$ given $\Lambda_r = \lambda$ is specified
by the p.d.f. $f_\lambda(x) = (dF_\lambda/d\tau)(x)$, $r = 1, \ldots, n$,
assuming that the family $\{F_\lambda : \lambda \in \Theta\}$ is dominated
by a $\sigma$-finite measure $\tau$ on $\mathcal{X}$.

The empirical Bayes approach constructs a decision procedure
concerning $\Lambda_{n+1}$ (unobservable), based on the values that have been
observed for $X_1, \ldots, X_{n+1}$, i.e., using $x_1, \ldots, x_{n+1}$. The
$(\Lambda_1, \ldots, \Lambda_n)$ remain unobservable throughout. For a
decision about $\Lambda_{n+1}$ a function $t_n(x_{n+1})$, whose form depends
on the values $x_1, \ldots, x_n$, is used. An e.B. decision procedure is a
sequence $T = \{t_n\}$ of such functions.

Robbins (1963, 1964) defined a sequence $T = \{t_n\}$ as being
asymptotically optimal (a.o.) relative to $G$ if

$$\lim_{n \to \infty} r^*(t_n, G) = r(G).$$

He also gives conditions under which a rule is a.o.

A question that arises when dealing with empirical Bayes decision
rules is "how fast, relative to $n$, does $r^*(t_n, G)$ converge to the
minimum risk?". Johns and Van Ryzin (1971, 1972) have studied the problem
and given rates in different situations. We shall also consider the
rate problem in relation to our rules for testing outliers.
3. An Extension of Empirical Bayes Procedures.

The results presented for testing outliers are based on a non-standard
empirical Bayes framework which can be described as follows. Let
$(X_1, \Lambda_1), (X_2, \Lambda_2), \ldots, (X_n, \Lambda_n)$ be mutually
independent pairs of random variables, where $X_r$ $(r = 1, \ldots, n)$ is
defined on a sample space $\mathcal{X}$ and $\Lambda_r$ $(r = 1, \ldots, n)$
on a parameter space $\Theta$. The $\Lambda_r$, $r = 1, \ldots, n$, are
assumed to have a common prior distribution $G$ on $\Theta$ and the
conditional density of $X_r$ given that $\Lambda_r = \lambda$ is
$f_\lambda(x_r)$, $r = 1, \ldots, n$, w.r.t. a $\sigma$-finite measure $\tau$.

We now define the empirical Bayes rule for the $r$-th problem,
$r = 1, \ldots, n$, denoted $t_n^{(r)}(X_r)$. Let
$t_n(x) = t_n(X_1, \ldots, X_n; x)$. Define

$$\mathbf{X}_r(x) = (X_1, \ldots, X_{r-1}, x, X_{r+1}, \ldots, X_n)$$

and

$$t_n^{(r)}(x) = t_n(X_1, \ldots, X_{r-1}, x, X_{r+1}, \ldots, X_n; x) = t_n(\mathbf{X}_r(x); x),$$

so that the form of the $r$-th decision rule depends on $\mathbf{X}_r(x)$.
Note that $t_n^{(r)}(X_r) = t_n(X_r)$, since $t_n^{(r)}(x)$ differs from
$t_n(x)$ in that the first has a fixed value of $x$ in the place of the
$r$-th random variable.

This particular use of the e.B. method is non-standard. We have
two differences:

i) We are not actually working with a sequential decision procedure.
ii) We do not use a decision rule whose form depends only on the
first $(n-1)$ observations.

If $t_n^{(r)}(x)$ is obtained for each $r = 1, \ldots, n$ and we denote its
risk relative to $G$ by $r^*(t_n^{(r)}, G)$, we obtain an $n$-vector
$\mathbf{t}_n = \{t_n^{(r)}(x_r);\ r = 1, \ldots, n\}$ (or
$\mathbf{t}_n = \{t_n^{(r)}(x_r)\}$) of decision functions which
we shall call an e.B. decision procedure. In our application such a procedure
will give us a rule to determine whether each $X_r$, $r = 1, \ldots, n$, is
spurious or not. We make the following definition.

DEFINITION 1. Let $(X_1, \Lambda_1), \ldots, (X_n, \Lambda_n)$ be mutually
independent pairs of random variables and
$t_n^{(r)}(x) = t_n(\mathbf{X}_r(x); x)$, $r = 1, \ldots, n$. If

$$r^*(\mathbf{t}_n, G) = (1/n) \sum_{r=1}^{n} r^*(t_n^{(r)}, G) \to r(G) \quad \text{as } n \to \infty \tag{3.1}$$

then the e.B. procedure $\mathbf{t}_n = \{t_n^{(r)}(x_r)\}$ is said to be
"average risk asymptotically optimal" (a.r.a.o.) relative to $G$.

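The symbol $\xrightarrow{P}$ will be used to denote convergence in
probability. We now state and prove the following lemma in the case where
$\Theta = \{\theta_0, \theta_1\}$ and $A = \{a_0, a_1\}$.
A Bayes rule against $G$ in this case can be written as (see
Robbins (1964))

$$t_G(x) = \begin{cases} 0 & \text{if } \Delta_G(x) \ge 0 \\ 1 & \text{if } \Delta_G(x) < 0, \end{cases} \tag{3.2}$$

where $t(x) = \Pr\{\text{taking action } a_0 \mid X = x\}$ and

$$\Delta_G(x) = \int \{L(a_0, \lambda) - L(a_1, \lambda)\} f_\lambda(x)\,dG(\lambda). \tag{3.3}$$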
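The two-action Bayes rule can be sketched in code for a prior with finitely many support points. This is only an illustration under stated assumptions: the function names (`normal_pdf`, `delta_G`, `bayes_rule`), the dictionary representation of the prior and the 0-1 style losses in the usage note are all hypothetical, and the reading of the value 0 as "take $a_1$" is inferred from the sign of $\Delta_G$ rather than quoted from the paper.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def delta_G(x, prior, loss_a0, loss_a1, density):
    """Delta_G(x) = sum over lam of {L(a0, lam) - L(a1, lam)} f_lam(x) G({lam}).

    prior   : dict mapping each support point lam to its prior mass
    loss_a0 : function lam -> L(a0, lam)
    loss_a1 : function lam -> L(a1, lam)
    density : function (x, lam) -> f_lam(x)
    """
    return sum(mass * (loss_a0(lam) - loss_a1(lam)) * density(x, lam)
               for lam, mass in prior.items())

def bayes_rule(x, prior, loss_a0, loss_a1, density):
    """Return 0 when Delta_G(x) >= 0 and 1 otherwise.

    Delta_G(x) >= 0 means the expected loss of a0 is at least that of a1,
    so the value 0 corresponds to taking a1 (assumed reading).
    """
    return 0 if delta_G(x, prior, loss_a0, loss_a1, density) >= 0 else 1
```

For instance, with prior masses 0.9 on $\lambda = 1$ and 0.1 on $\lambda = 9$, unit losses for wrong decisions and $f_\lambda$ the $N(0, \lambda)$ density, the rule returns 1 at $x = 0$ and 0 at $x = 5$.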

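LEMMA 1. Let $\Delta_n(x) = \Delta_n(\mathbf{X}; x)$ be such that as
$n \to \infty$, $\Delta_n(x) \xrightarrow{P} \Delta_G(x)$ a.e. $(\tau)$. Define

$$t_n^{(r)}(x) = \begin{cases} 0 & \text{if } \Delta_n^{(r)}(x) \ge 0 \\ 1 & \text{otherwise,} \end{cases} \qquad \Delta_n^{(r)}(x) = \Delta_n(\mathbf{X}_r(x); x). \tag{3.4}$$

Then, for any $0 < d_1 \le 1$, $0 < d_2 \le 1$, and each fixed $r$,

$$r^*(t_n^{(r)}, G) - r(G) \le 2^{d_1} \int |\Delta_G(x)|^{1-d_1} E|\Delta_n^{(r)}(x) - \Delta_n(x)|^{d_1}\,d\tau(x) + 2^{d_2} \int |\Delta_G(x)|^{1-d_2} E|\Delta_n(x) - \Delta_G(x)|^{d_2}\,d\tau(x), \quad r = 1, \ldots, n.$$

$E$ denotes the expectation under $X_1, \ldots, X_n$.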

PROOF. From the definition of $t_n^{(r)}(x)$ and $r^*(t_n^{(r)}, G)$ we have

$$r^*(t_n^{(r)}, G) - r(G) = E \int \Delta_G(x)\,[t_n^{(r)}(x) - t_G(x)]\,d\tau(x) \tag{3.5}$$

with $t_G(x)$ and $\Delta_G(x)$ as in (3.2) and (3.3). But from (3.2) and (3.4),
we see that

$$|t_n^{(r)}(x) - t_G(x)| \le \begin{cases} 1 & \text{if } |\Delta_n^{(r)}(x) - \Delta_G(x)| \ge |\Delta_G(x)| \\ 0 & \text{otherwise,} \end{cases}$$

and hence

$$E|t_n^{(r)}(x) - t_G(x)| \le \Pr\{|\Delta_n^{(r)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\}. \tag{3.6}$$

This result and using Fubini's theorem in (3.5) gives

$$r^*(t_n^{(r)}, G) - r(G) \le \int |\Delta_G(x)| \Pr\{|\Delta_n^{(r)}(x) - \Delta_n(x)| \ge |\Delta_G(x)|/2\}\,d\tau(x) + \int |\Delta_G(x)| \Pr\{|\Delta_n(x) - \Delta_G(x)| \ge |\Delta_G(x)|/2\}\,d\tau(x).$$

Markov's inequality applied to the first term with $0 < d_1 \le 1$ and to the
second with $0 < d_2 \le 1$ yields the required result.

Q.E.D.

The following result is an extension of Corollary 1.2 of Robbins (1964)
to a.r.a.o. decision rules.

THEOREM 1. Let $(X_1, \Lambda_1), \ldots, (X_n, \Lambda_n)$ be mutually
independent pairs of random variables. Let $A = \{a_0, a_1\}$, and let $G$
be such that

$$\int L(a_i, \lambda)\,dG(\lambda) < \infty, \quad i = 0, 1. \tag{3.7}$$

Let

$$t_n^{(r)}(x) = \begin{cases} 0 & \text{if } \Delta_n^{(r)}(x) \ge 0 \\ 1 & \text{if } \Delta_n^{(r)}(x) < 0, \end{cases}$$

where

$$\Delta_n^{(r)}(x) = \Delta_n(\mathbf{X}_r(x); x).$$

Assume that

$$\Delta_n(x) = Q(\phi_1(\mathbf{X}), \ldots, \phi_m(\mathbf{X}); x), \quad m \ge 1,$$

where the following conditions are true:

a) $Q(y_1, \ldots, y_m; x)$ is continuous in every $y_j$, $j = 1, \ldots, m$, a.e. $(\tau)$.

b) $\phi_j(\mathbf{X}_1(x)) - \phi_j(\mathbf{X}) = \phi_j(x, X_2, \ldots, X_n) - \phi_j(X_1, \ldots, X_n) \xrightarrow{P} 0$ as
$n \to \infty$, $j = 1, \ldots, m$, a.e. $(\tau)$.

c) $\phi_j(X_1, \ldots, X_n) = \phi_j(X_{\nu_1}, \ldots, X_{\nu_n})$, $j = 1, \ldots, m$, where $(\nu_1, \ldots, \nu_n)$
is any permutation of the subscripts $(1, \ldots, n)$, i.e. the $\phi_j$'s
are symmetric in $(X_1, \ldots, X_n)$.

d) $\Delta_n(x) \xrightarrow{P} \Delta_G(x)$ as $n \to \infty$, a.e. $(\tau)$.

Then $\mathbf{t}_n = \{t_n^{(r)}(X_r)\}$ is a.r.a.o.

PROOF. First note that for $r = 1$

$$\Delta_n^{(1)}(x) = Q(\phi_1(\mathbf{X}_1(x)), \ldots, \phi_m(\mathbf{X}_1(x)); x) = \{Q(\phi_1(\mathbf{X}_1(x)), \ldots, \phi_m(\mathbf{X}_1(x)); x) - Q(\phi_1(\mathbf{X}), \ldots, \phi_m(\mathbf{X}); x)\} + \Delta_n(x).$$

Now conditions a) and b) imply that the term in brackets converges to
zero in probability. Thus, with condition d), as $n \to \infty$, a.e. $(\tau)$,

$$\Delta_n^{(1)}(x) \xrightarrow{P} \Delta_G(x), \quad x \text{ fixed}.$$

From Equation (3.6), we have

$$r^*(t_n^{(r)}, G) - r(G) \le \int |\Delta_G(x)| \Pr\{|\Delta_n^{(r)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\}\,d\tau(x),$$

so that

$$\sup_{1 \le r \le n} \{r^*(t_n^{(r)}, G) - r(G)\} \le \int |\Delta_G(x)| \sup_{1 \le r \le n} \Pr\{|\Delta_n^{(r)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\}\,d\tau(x).$$

From the symmetry of the $\phi_j$'s (condition c)) we get

$$\Pr\{|\Delta_n^{(r)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\} = \Pr\{|\Delta_n^{(1)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\},$$

for $r = 1, \ldots, n$. Hence

$$\sup_{1 \le r \le n} \{r^*(t_n^{(r)}, G) - r(G)\} \le \int |\Delta_G(x)| \Pr\{|\Delta_n^{(1)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\}\,d\tau(x).$$

Now

$$|\Delta_G(x)| \Pr\{|\Delta_n^{(1)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\} \le |\Delta_G(x)|,$$

and (3.7) implies $\Delta_G(x)$ is integrable. Since, as $n \to \infty$,
$\Delta_n^{(1)}(x) \xrightarrow{P} \Delta_G(x)$ implies

$$|\Delta_G(x)| \Pr\{|\Delta_n^{(1)}(x) - \Delta_G(x)| \ge |\Delta_G(x)|\} \to 0 \quad \text{as } n \to \infty, \text{ a.e. } (\tau),$$

we can apply the Dominated Convergence Theorem to get

$$\sup_{1 \le r \le n} \{r^*(t_n^{(r)}, G) - r(G)\} \to 0, \quad \text{as } n \to \infty.$$

Since

$$(1/n) \sum_{r=1}^{n} \{r^*(t_n^{(r)}, G) - r(G)\} \le \sup_{1 \le r \le n} \{r^*(t_n^{(r)}, G) - r(G)\},$$

we have $r^*(\mathbf{t}_n, G) - r(G) \to 0$ as $n \to \infty$, so that
$\mathbf{t}_n$ is a.r.a.o.

Q.E.D.


It is convenient to note at this point that we only have to verify
the conditions of Theorem 1 in order to prove a.r.a.o. The following
lemma will prove useful for deriving results on rates of convergence.

LEMMA 2. If $\Delta_n(x) = \Delta_n(\mathbf{X}; x)$ is symmetric in
$(X_1, \ldots, X_n)$ and

$$r^*(t_n^{(1)}, G) - r(G) = O(n^{-s}) \quad \text{for some } s > 0,$$

then

$$r^*(\mathbf{t}_n, G) - r(G) = O(n^{-s}).$$

The proof follows from (3.6) and the symmetry of $\Delta_n(x)$.

4. An e.B. Test for Outliers.

Consider $n$ independent random variables $X_1, \ldots, X_n$, where
given $\Lambda_r = \lambda$, $X_r$ is normally distributed with mean $\mu$
and variance $\sigma^2 \lambda$, $r = 1, \ldots, n$, i.e.

$$X_r \sim N(\mu, \sigma^2 \lambda), \quad r = 1, \ldots, n. \tag{4.1}$$

$\mu$ and $\sigma^2$ are assumed known and $\Lambda_r$ has a prior distribution
$G = \{p, 1 - p\}$ on $\Theta = \{1, \lambda_0\}$, defined as follows:

$$\Pr\{\Lambda_r = 1\} = G(1^+) - G(1) = p,$$
$$\Pr\{\Lambda_r = \lambda_0\} = G(\lambda_0^+) - G(\lambda_0) = 1 - p, \quad 0 < p < 1,$$

for some $\lambda_0 > 1$, known, and $r = 1, \ldots, n$. We can use empirical
Bayes methods to test the null hypothesis, for each $r$, $r = 1, \ldots, n$,

$$H_0 : X_r \sim N(\mu, \sigma^2), \quad \text{vs.} \quad H_1 : X_r \sim N(\mu, \lambda_0 \sigma^2).$$

The procedure is given in the following theorem.

THEOREM 2. Let $X_r$, $r = 1, \ldots, n$ be defined as above, $A = \{a_0, a_1\}$
where the action $a_i$ is defined as $a_i$ = "decide in favor of $H_i$",
$i = 0, 1$, and the loss function be such that

$$L(a_0, 1) = L(a_1, \lambda_0) = 0.$$

Furthermore, define $\Delta_n(x)$ as

$$\Delta_n(x) = (1 - \hat{p})\,L(a_0, \lambda_0)\,f_{\lambda_0}(x) - \hat{p}\,L(a_1, 1)\,f_1(x),$$

where $f_\lambda(x)$ is the p.d.f. of $X$ obtained when $\Lambda = \lambda$,
$\lambda = 1$ or $\lambda_0$.
If $\hat{p} = \hat{p}(\mathbf{X})$ is a consistent estimator for $p$,
symmetric in $\mathbf{X}$, and a.e. $(\tau)$

$$\hat{p}(\mathbf{X}_1(x)) - \hat{p}(\mathbf{X}) \xrightarrow{P} 0, \quad \text{as } n \to \infty,$$

then the e.B. decision rule

$$t_n(X_r) = \begin{cases} 0 & \text{if } \Delta_n(X_r) \ge 0 \\ 1 & \text{otherwise,} \end{cases} \tag{4.2}$$

$r = 1, \ldots, n$, is a.r.a.o. for testing $H_0$ vs. $H_1$.

Under $H_0$, $X_r$ has the p.d.f.

$$f_1(x_r) = (1/\sigma\sqrt{2\pi}) \exp\{-(x_r - \mu)^2 / 2\sigma^2\} \tag{4.3}$$

and under $H_1$,

$$f_{\lambda_0}(x_r) = (1/\sigma\sqrt{2\pi\lambda_0}) \exp\{-(x_r - \mu)^2 / 2\lambda_0\sigma^2\}, \quad r = 1, \ldots, n. \tag{4.4}$$

The proof of the theorem is a direct consequence of Theorem 1 for $m = 1$.

One particular case of a consistent (symmetric) estimator $\hat{p}$ is given by

$$\hat{p} = (1/n) \sum_{r=1}^{n} \xi_0(X_r), \tag{4.5}$$

where

$$\xi_0(x) = \frac{\lambda_0 \sigma^2 - (x - \mu)^2}{\sigma^2 (\lambda_0 - 1)}.$$
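A minimal sketch of the procedure of this section, under stated assumptions, may help fix ideas. It takes the estimator of $p$ to be the sample mean of $\{\lambda_0\sigma^2 - (X_r - \mu)^2\}/\{\sigma^2(\lambda_0 - 1)\}$, which is unbiased for $p$ because $E(X_r - \mu)^2 = \sigma^2\{p + (1-p)\lambda_0\}$. The function names, the clipping of the estimate to $[0, 1]$ and the default unit losses are illustrative choices, not taken from the paper. An observation is flagged as spurious when $\Delta_n(X_r) \ge 0$, i.e. when the estimated expected loss of deciding $H_0$ is at least that of deciding $H_1$.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def estimate_p(xs, mu, sigma2, lam0):
    """Moment estimate of p: mean of {lam0*sigma2 - (x-mu)^2} / {sigma2*(lam0-1)}."""
    raw = sum((lam0 * sigma2 - (x - mu) ** 2) / (sigma2 * (lam0 - 1.0)) for x in xs) / len(xs)
    # Clipping to [0, 1] is an added safeguard (assumption), not part of the estimator.
    return min(1.0, max(0.0, raw))

def eb_outlier_flags(xs, mu, sigma2, lam0, loss_a0=1.0, loss_a1=1.0):
    """Return (p_hat, flags), where flags[r] is True when Delta_n(x_r) >= 0.

    loss_a0 stands for L(a0, lam0) and loss_a1 for L(a1, 1); both default to 1.
    """
    p_hat = estimate_p(xs, mu, sigma2, lam0)
    flags = []
    for x in xs:
        delta = ((1.0 - p_hat) * loss_a0 * normal_pdf(x, mu, lam0 * sigma2)
                 - p_hat * loss_a1 * normal_pdf(x, mu, sigma2))
        flags.append(delta >= 0)
    return p_hat, flags
```

With $\mu = 0$, $\sigma^2 = 1$, $\lambda_0 = 9$ and the sample 0.1, -0.3, 0.5, -0.8, 1.2, -1.1, 0.4, 0.0, -0.2, 6.0, only the last observation is flagged.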

A question  that  arises  when  estimating  G is  that  of  identifiability. 

Maritz  (1970)  gives  results  on  identifiability  for  location  parameters 

under  normality.  A result  for  scale  parameters  can  be  found  in  de  Alba  (1974). 

As  pointed  out  earlier,  the  usefulness  of  e.  B.  procedures  in 
statistical  applications  depends  on  how  fast  the  Bayes  risk  of  each 
successive  decision  problem  approaches  the  minimum  Bayes  risk.  In 
relation  to  Theorem  2 we  have  the  following  result. 

THEOREM  3.  Assume  the  conditions  of  Theorem  2 hold.  Let 


$\hat{p} = (1/n) \sum_{r=1}^{n} \xi_0(X_r)$, where $\xi_0(X_r) = \{\lambda_0\sigma^2 - (X_r - \mu)^2\} / \{\sigma^2(\lambda_0 - 1)\}$; then


The proof is straightforward and follows from applying Lemma 1, with
$d_1 = 1$ and $d_2 = 1$, and from Lemma 2 (see de Alba (1974) for details).

In  this  section  we  have  introduced  our  approach  to  the  outlier 
problem,  applying  it  to  a particular  case.  In  the  following  sections  we 
will  give  some  extensions. 


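5. Small Versus Large Outliers.

Suppose we are interested in a test for "small outliers" against
"large outliers", i.e.

$$H_0 : \lambda \le \lambda_0 \ \text{ for some } \lambda_0 > 1, \quad \text{vs.} \quad H_1 : \lambda > \lambda_0, \tag{5.1}$$

where $\lambda \ge 1$. The "largeness" criterion is determined by the value of
$\lambda_0$. A reasonable loss function for this test is given by

$$L(a_0, \lambda) = \begin{cases} 0 & \lambda \le \lambda_0 \\ (1/\lambda_0 - 1/\lambda) & \lambda > \lambda_0, \end{cases} \tag{5.2}$$

$$L(a_1, \lambda) = \begin{cases} (1/\lambda - 1/\lambda_0) & \lambda \le \lambda_0 \\ 0 & \lambda > \lambda_0. \end{cases} \tag{5.3}$$

Define now

$$b(\lambda) = L(a_0, \lambda) - L(a_1, \lambda),$$

so that

$$\Delta_G(x) = \int (L(a_0, \lambda) - L(a_1, \lambda)) f_\lambda(x)\,dG(\lambda) = (1/\lambda_0) f_G(x) + \{\sigma^2/(x - \mu)\} f'_G(x),$$

with $f_G(x) = \int f_\lambda(x)\,dG(\lambda)$, and $f'_G(x)$ is obtained from the
definition of $f_G(x)$ by differentiating under the integral sign. The Bayes
rule is:

$$t_G(x) = \begin{cases} 0 & \text{if } \Delta_G(x) \ge 0 \\ 1 & \text{otherwise.} \end{cases}$$

The e.B. rule may be derived by getting consistent estimators for $f_G(x)$
and $f'_G(x)$. We have the following theorem.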

THEOREM 4. Let $X_1, \ldots, X_n$ be independent random variables defined
as in (4.1), the parameter space $\Theta = [1, \infty)$ and let $G$ be any
c.d.f. defined on $\Theta$. The action space is $A = \{a_0, a_1\}$, $H_0$ and
$H_1$ are given by (5.1) and the loss function by (5.2)-(5.3). Let
$f_n(x) = f_n(\mathbf{X}; x)$ and $f'_n(x) = f'_n(\mathbf{X}; x)$ be any
consistent estimators of $f_G(x)$ and $f'_G(x)$ respectively, for all $x$,
symmetric in $\mathbf{X}$, and such that

$$f_n(\mathbf{X}_1(x); x) - f_n(\mathbf{X}; x) \xrightarrow{P} 0, \quad \text{for all } x, \tag{5.5}$$

$$f'_n(\mathbf{X}_1(x); x) - f'_n(\mathbf{X}; x) \xrightarrow{P} 0, \quad \text{for all } x, \tag{5.6}$$

as $n \to \infty$, and

$$\Delta_n(x) = (1/\lambda_0) f_n(x) + \{\sigma^2/(x - \mu)\} f'_n(x).$$

Then

$$t_n(X_r) = \begin{cases} 0 & \text{if } \Delta_n(X_r) \ge 0 \\ 1 & \text{otherwise,} \end{cases} \tag{5.7}$$

$r = 1, \ldots, n$, is a.r.a.o. relative to $G$.

The proof follows from Theorem 1 with $m = 2$.

A particular choice of $f_n(x)$ and $f'_n(x)$ is given by Johns and
Van Ryzin (1972). With this particular choice, conditions (5.5) and (5.6) are
satisfied (see de Alba (1974)). Theorem 3 of Johns and Van Ryzin (1972) will
be very useful for proving our rate theorems.

We now define $\mathcal{K}_1$ [$\mathcal{K}_2$] as the class of real-valued
measurable functions on the real line satisfying the conditions
of the definition of $f_n(x)$ [$f'_n(x)$] given in Johns and Van Ryzin (1972).

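Also note that since we are assuming $\mu$ and $\sigma^2$ are known, in
(4.1) we can define $y = (x - \mu)^2 / 2\sigma^2$ and restate our problem for the
density

$$f_\lambda(y) = \begin{cases} (1/\sqrt{\pi y \lambda}) \exp\{-y/\lambda\}, & y > 0,\ \lambda \ge 1 \\ 0 & \text{elsewhere.} \end{cases} \tag{5.8}$$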
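As a sanity check on the transformed density, note that if $X \sim N(\mu, \sigma^2\lambda)$ then $Y = (X - \mu)^2/2\sigma^2$ has a gamma distribution with shape $1/2$ and scale $\lambda$, whose density is $(\pi y \lambda)^{-1/2}\exp\{-y/\lambda\}$ on $y > 0$. The sketch below checks numerically that this density integrates to one; the function names and the choice of a midpoint rule after the substitution $y = u^2$ (which removes the singularity at the origin) are illustrative assumptions.

```python
import math

def f_lambda(y, lam):
    """Density (pi*y*lam)^(-1/2) * exp(-y/lam) for y > 0, zero elsewhere."""
    if y <= 0.0:
        return 0.0
    return math.exp(-y / lam) / math.sqrt(math.pi * y * lam)

def integrate_density(lam, steps=20000):
    """Midpoint-rule integral of f_lambda over y > 0 after substituting y = u^2.

    The upper limit u = 10*sqrt(lam) is an assumed truncation point; the
    neglected tail is of order exp(-100).
    """
    upper_u = 10.0 * math.sqrt(lam)
    h = upper_u / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += 2.0 * u * f_lambda(u * u, lam) * h  # dy = 2u du
    return total
```

Calling `integrate_density(1.0)` or `integrate_density(4.0)` returns a value that agrees with 1 to well within the discretization error.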


We can now prove the following rate theorem.

THEOREM 5. Let $Y_i = (X_i - \mu)^2 / 2\sigma^2$, $i = 1, \ldots, n$, where the
$X_i$ are independent random variables defined as in (4.1). Let $f_n(y)$ and
$f'_n(y)$ be estimates of $f_G(y) = \int f_\lambda(y)\,dG(\lambda)$ and its
derivative given as in [7], with $t_n(y)$ defined by (5.7) for
$\Delta_n(y) = (\lambda_0^{-1} + (2y)^{-1}) f_n(y) + f'_n(y)$.
For any $\ell \ge 2$, if we choose $h_n = O(n^{-1/(2\ell+1)})$ and
$K_i \in \mathcal{K}_i$ in defining $f_n(y)$ and $f'_n(y)$ such that

$$\int u^{j+i-1} K_i(u)\,du = 0 \quad \text{for } j = 1, \ldots, \ell - 1,\ i = 1, 2, \tag{5.9}$$

and if for some $d$, $0 < d < 1/(2\ell + 3)$,

$$E(\Lambda^{(1+t)d/(2-d)}) < \infty \quad \text{for some } t > 0, \tag{5.10}$$

then

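PROOF. If in (5.8) we let $\theta = 1/\lambda$, then

$$f_\theta(y) = \begin{cases} (\sqrt{\theta}/\sqrt{\pi y}) \exp\{-y\theta\} & \text{for } y > 0,\ 0 < \theta \le 1 \\ 0 & \text{otherwise.} \end{cases}$$

Furthermore, if

$$\beta(\theta) = \sqrt{\theta} \quad \text{and} \quad h(y) = 1/\sqrt{\pi y}, \tag{5.11}$$

then $f_\theta(y)$ falls into the theory for Case I of Johns and Van Ryzin (1972).
Now the loss function (5.2)-(5.3) may be written in terms of $\theta$, with
$\theta_0 = 1/\lambda_0$, as

$$L(a_0, \theta) = \begin{cases} 0 & \theta \ge \theta_0 \\ \theta_0 - \theta & \theta < \theta_0 \end{cases}$$

and

$$L(a_1, \theta) = \begin{cases} 0 & \theta < \theta_0 \\ \theta - \theta_0 & \theta \ge \theta_0. \end{cases}$$

The hypothesis to be tested will be $H_0^* : \theta \ge \theta_0$ vs.
$H_1^* : \theta < \theta_0$.

From Lemma 1 with $d_1 = d_2 = d$ we see that

$$r^*(t_n^{(1)}, G) - r(G) \le 2^d \int |\Delta_G(y)|^{1-d} E|\Delta_n^{(1)}(y) - \Delta_n(y)|^d\,dy + 2^d \int |\Delta_G(y)|^{1-d} E|\Delta_n(y) - \Delta_G(y)|^d\,dy, \tag{5.12}$$

where $\Delta_G(y) = (\lambda_0^{-1} + (2y)^{-1}) f_G(y) + f'_G(y)$.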


Consider the second term on the right hand side of (5.12). We now verify
the conditions of Theorem 3 in [7] to obtain the rate of convergence for this term.
Note that from (5.11), we have

$$h^{(i)}(y) = (1/\sqrt{\pi})(-1/2)^i \bigl(1 \cdot (1+2) \cdot (1+4) \cdots (1 + 2(i-1))\bigr)\, y^{-(2i+1)/2}. \tag{5.13}$$

Hence $h^{(i)}(y)$ exists and is continuous for any $i \ge 1$. From this and
Lemma 2 in Johns and Van Ryzin (1972) we know that $f_G^{(\ell)}(y)$ exists
and is continuous for any $\ell \ge 1$ and $y > 0$. Furthermore, since $\theta$
is defined on $(0, 1]$, $E\theta < \infty$ is always true as required by their
Theorem 3. We now verify (3.3) of their Theorem 3. Note that


(5.14) 


where the $a_j$'s are non-negative constants. Throughout, when writing
the expression for $f_\theta^{(\ell)}(y)$, the summation will be assumed to be
from $j = 1$ to $j = \ell$ unless stated otherwise. $f_\theta(y)$ is a
decreasing function of $y$, for $y > 0$. Hence

$$f_\varepsilon^*(y) = \sup_{0 \le t \le \varepsilon} f_G(y + t) = f_G(y) \tag{5.15}$$

for all $y > 0$ and $\varepsilon > 0$. From this we get, using $0 < \theta \le 1$,

$$q_\varepsilon^{(\ell)}(y) = \sup_{0 \le t \le \varepsilon} |f_G^{(\ell)}(y + t)| = \sup_{0 \le t \le \varepsilon} \int |f_\theta^{(\ell)}(y + t)|\,dG(\theta) = \int |f_\theta^{(\ell)}(y)|\,dG(\theta)$$
$$= |(-1)^\ell| \int f_\theta(y) \sum a_j (\theta + 1/2y)^j y^{j-\ell}\,dG(\theta) = |f_G^{(\ell)}(y)|$$
$$\le \int f_\theta(y) \sum a_j (1 + 1/2y)^j y^{j-\ell}\,dG(\theta) = f_G(y) \sum a_j (1 + 1/2y)^j y^{j-\ell}, \tag{5.16}$$

where $f_G^{(\ell)}(y)$ is obtained by repeated differentiation under the
integral sign. Also

$$|\Delta_G(y)| = \Bigl|\theta_0 f_G(y) - \int \theta f_\theta(y)\,dG(\theta)\Bigr| \le (\theta_0 + 1) f_G(y). \tag{5.17}$$

By (5.16) and (5.17)

$$\int |\Delta_G(y)|^{1-d} \{q_\varepsilon^{(\ell)}(y)\}^d\,dy \le (1 + \theta_0)^{1-d} E\Bigl(\sum a_j^* (2Y + 1)^j Y^{-\ell}\Bigr)^d, \tag{5.18}$$

where $a_j^* = a_j / 2^j$ and the expectation is taken with respect to $f_G(y)$.

Repeated application of the $c_r$-inequality yields

$$(1 + \theta_0)^{1-d} E\Bigl\{\sum a_j^* (2Y + 1)^j Y^{-\ell}\Bigr\}^d \le (1 + \theta_0)^{1-d} \sum b_j E|(2Y + 1)^j Y^{-\ell}|^d, \tag{5.19}$$

where the $b_j$ are constants determined by the $a_j^*$ and the constants of
the $c_r$-inequality. Now by the Hölder inequality

$$E|(2Y + 1)^j Y^{-\ell}|^d \le \{E(2Y + 1)^j\}^d \{E Y^{-\ell d/(1-d)}\}^{1-d}.$$

The first factor on the right is always finite so we need only verify that
the second factor is also finite. By Fubini's theorem

$$E\{Y^{-\ell d/(1-d)}\} = \int_0^1 \{\theta^{\ell d/(1-d)} / \Gamma(1/2)\} \Bigl\{\int_0^\infty (\theta y)^{1/2 - \ell d/(1-d) - 1} \exp\{-\theta y\}\,\theta\,dy\Bigr\}\,dG(\theta).$$

Hence, provided

$$1/2 - \ell d/(1-d) > 0, \tag{5.20}$$

the expectation will be finite. But this requires $d < 1/(2\ell + 1)$ and we
have as a condition in the theorem that $d < 1/(2\ell + 3)$. Hence condition
(3.3) of Theorem 3 in Johns and Van Ryzin (1972) holds.

We next verify (3.4) of their Theorem 3. From (5.16), (5.17) and
$v(y) = h'(y)/h(y) = -1/2y$ we get

$$\int |\Delta_G(y)|^{1-d} \{|v(y)|\,q_\varepsilon^{(\ell)}(y)\}^d\,dy \le (1 + \theta_0)^{1-d} E\Bigl\{\sum a_j^* (1 + 2Y)^j Y^{-s}\Bigr\}^d,$$

where $s = \ell + 1$. Hence we can apply the same argument from (5.19) to (5.20)
with $s$ instead of $\ell$ and the condition will be true provided

$$1/2 - sd/(1-d) = 1/2 - (\ell + 1)d/(1-d) > 0,$$

which requires $d < 1/(2\ell + 3)$. Hence, condition (3.4) in Johns and
Van Ryzin (1972) is satisfied.


To verify condition (3.2) of their theorem we use $v(y) = -1/2y$, (5.15)
and (5.17) to obtain

$$\int |\Delta_G(y)|^{1-d} |v(y)|^d \{f_\varepsilon^*(y)\}^{d/2}\,dy \le 2^{-d}(1 + \theta_0)^{1-d} \Bigl\{\int_0^1 y^{-d} \{f_G(y)\}^{1-d/2}\,dy + \int_1^\infty y^{-d} \{f_G(y)\}^{1-d/2}\,dy\Bigr\} = 2^{-d}(1 + \theta_0)^{1-d} [A_I + A_{II}],$$

where $A_I$ and $A_{II}$ are defined in an obvious manner. Using the fact
that $y \ge 1$ and applying the Hölder inequality we get, for $t > 0$,

$$A_{II} \le \int_1^\infty \{f_G(y)\}^{1-d/2}\,dy \le \Bigl\{\int_1^\infty y^{-(1+t)}\,dy\Bigr\}^{d/2} \Bigl\{\int_1^\infty y^{T_{II}} f_G(y)\,dy\Bigr\}^{1-d/2},$$

with

$$T_{II} = (1 + t)d/(2 - d).$$

Clearly the first factor on the right is finite. Finiteness of the second
factor follows from the definition of $f_G(y)$, Fubini's Theorem, the use of
Laplace transforms and application of condition (5.10).

$A_I$ can be shown to be finite by noting that

$$\int_0^1 y^{-d} \{f_G(y)\}^{1-d/2}\,dy \le \pi^{-(2-d)/4} \int_0^1 y^{-d - (2-d)/4}\,dy.$$

Hence condition (3.2) in Johns and Van Ryzin (1972) holds.

Condition (3.1) of Johns and Van Ryzin (1972) can be shown to hold
by arguments which are essentially the same as those used so far.
This completes the proof that under certain conditions and if the
consistent estimators of $f_G(y)$ and $f'_G(y)$ are used as given in
Johns and Van Ryzin (1972), then the rate of convergence of the second
term  in  (5.12)  is  0(n  ^ ^ ^ V We  shall  now  give  the  rate  of 

convergence  of  the  first  term  in  (5.12). 

From  the  definition  of  A(r)(y),  f (y)  and  f'(y)  we  have 

n n n 

An>(y)  = (eo  + + ^'(y) 

where 

fn  }(y)  = ^ (i/2){K1{(Yj  - y)/hn}  + ^{(y  - Y.)/!^}}  , 

(1)'  \n 

fn  (y)  = (1/nhn}  L2  {(1/2hn)K2{(YJ  - y)/2hn)  - ^{(y  - Y.)/hn}}  , 
since  K.(0)  = 0,  i = 1,  2. 

Using  the  c_-inequality,  K*  = sup|K.(u)|  < <*,  i = 1,  2 and  (5.17) 

u 

we  get 

/ lAG(y)|1~dEUji1)(y)  - An(y)|ddy  < 

2(Kj/2nhn)d(l  + 0o)1_d  / (eQ  + l/2y)d  (fG(y)  }^_ddy 

+ <K2/nhn)d(2"d  + W1  + e0)1_d  / {fG(y)}1_ddy  • (5.21) 

Arguments similar to those used above can be
used to prove that both of the integrals that appear in (5.21) are
finite. This, together with the particular choice of
$h_n = O(n^{-1/(2\ell+1)})$, yields

$$\int |\Delta_G(y)|^{1-d} E|\Delta_n^{(1)}(y) - \Delta_n(y)|^d\,dy \le O(n^{-d(2\ell-1)/(2\ell+1)}).$$

The proof of the theorem is completed by using Lemma 2.

Q.E.D.
This theorem completes the results on testing "small outliers"
against "large outliers". In the next section we shall consider another
variation of tests for outliers under the e.B. approach.


6.  Unknown  Mean  and  Variance. 

All the results derived up to now have assumed $\mu$ and $\sigma^2$ are
both known. In this section we shall relax this assumption. The first
result we present corresponds to the situation given in Theorem 2 without
the assumption that $\mu$ and $\sigma^2$ are known.

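THEOREM 6. Assume all the conditions of Theorem 2 except that $\mu$ and
$\sigma^2$ are unknown and

$$\hat{p} = \min\{1, p^{*+}\}, \quad p^* = \{(m_2 - \bar{X}^2 - \lambda\hat{\sigma}^2) / (1 - \lambda)\hat{\sigma}^2\},$$

where $A^+ = \max\{0, A\}$,

$$m_s = (1/n) \sum_{r=1}^{n} X_r^s; \quad \bar{X} = m_1,$$

and

$$\hat{\sigma}^2 = \Bigl\{3(m_2 - \bar{X}^2)(1 + \lambda) \pm \sqrt{9(m_2 - \bar{X}^2)^2(1 + \lambda)^2 - 12\lambda(m_4 - 6\bar{X}^2 m_2 + 5\bar{X}^4)}\Bigr\} \Big/ 6\lambda. \tag{6.1}$$

Also let

$$\hat{f}_1(x) = (1/\sqrt{2\pi\hat{\sigma}^2}) \exp\{-(x - \bar{X})^2 / 2\hat{\sigma}^2\}, \tag{6.2}$$

$$\hat{f}_\lambda(x) = (1/\sqrt{2\pi\hat{\sigma}^2\lambda}) \exp\{-(x - \bar{X})^2 / 2\hat{\sigma}^2\lambda\}, \tag{6.3}$$

and

$$\Delta_n(x) = (1 - \hat{p}) L_0(\lambda) \hat{f}_\lambda(x) - \hat{p} L_1(1) \hat{f}_1(x).$$

Then the e.B. decision rule

$$t_n(X_r) = \begin{cases} 0 & \text{if } \Delta_n(X_r) \ge 0 \\ 1 & \text{otherwise,} \end{cases}$$

$r = 1, \ldots, n$, is a.r.a.o. for testing $H_0$, relative to $G = \{p, q\}$.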

As in Theorem 2, (4.3) and (4.4) are true and by the same argument
given there we have that (3.4) is satisfied. The remaining conditions
of Theorem 1 are easily verified by using Slutzky's theorem and the
fact that the sample moments are consistent and such that, as $n \to \infty$,

$$m_4 \xrightarrow{P} 3\sigma^4[p + (1 - p)\lambda^2] + 6\mu^2\sigma^2[p + (1 - p)\lambda] + \mu^4,$$

$$m_2 \xrightarrow{P} \sigma^2\{p + (1 - p)\lambda\} + \mu^2,$$

$$m_1 = \bar{X} \xrightarrow{P} \mu. \tag{6.4}$$

If only $\sigma^2$ is unknown similar arguments can be used. We must
use $\mu$ instead of $\bar{X}$ and we do not need (6.4).

Notice that (6.1) is written with both signs (+ and -). The question of which
root to use can be answered as follows. In the expression under the square
root sign, as $n \to \infty$,

$$9(m_2 - \bar{X}^2)^2(1 + \lambda)^2 - 12\lambda(m_4 - 6\bar{X}^2 m_2 + 5\bar{X}^4) \xrightarrow{P} 9\sigma^4\{2\lambda - p - (1 - p)\lambda - \lambda p - (1 - p)\lambda^2\}^2 = 9\sigma^4\{p + (1 - p)\lambda + p\lambda + (1 - p)\lambda^2 - 2\lambda\}^2.$$

When we take the positive or negative signed term, the two possibilities
with respect to $\hat{\sigma}^2$ that we have as $n \to \infty$ are

$$\hat{\sigma}^2 \xrightarrow{P} \sigma^2 \quad \text{or} \quad \hat{\sigma}^2 \xrightarrow{P} \sigma^2(\lambda - \lambda p + p/\lambda).$$

But

$$\sigma^2(\lambda - \lambda p + p/\lambda) < \sigma^2 \ \text{ if } p > \lambda/(1 + \lambda), \qquad > \sigma^2 \ \text{ if } p < \lambda/(1 + \lambda).$$

Now, in practical applications $(1 - p)$ would usually be small and
$(1 - p) = 0.1$ is already considered too extreme. Box and Tiao (1968)
take the extreme value $\lambda = 100$. On the other hand it seems reasonable
to think that the larger the discrepancy between the variances (i.e.
the larger $\lambda$) the smaller we would expect $(1 - p)$ to be. So in
general we can expect $p$ to be such that $p > \lambda/(1 + \lambda)$. This can be
taken as an indication that the estimate of $\sigma^2$ is the largest of the
two values obtained. This in turn means we should take the positive
root in (6.1).
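The comparison between the two limiting values can be checked numerically. The sketch below evaluates the ratio $\lambda - \lambda p + p/\lambda$ of the alternative limit to $\sigma^2$ and the threshold $\lambda/(1 + \lambda)$ at which that ratio equals one; the function names are hypothetical and the numerical values in the usage note are only examples.

```python
def alt_limit_ratio(p, lam):
    """Ratio of the alternative limit sigma^2*(lam - lam*p + p/lam) to sigma^2."""
    return lam - lam * p + p / lam

def threshold(lam):
    """Value of p at which the alternative limit coincides with sigma^2."""
    return lam / (1.0 + lam)
```

For $\lambda = 100$ the threshold is about 0.990. The ratio is below one for $p = 0.995$ and well above one for $p = 0.9$, in line with the inequalities displayed above.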

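We present the following rate theorem for the situation where $\sigma^2$
is known and $\mu$ is unknown.

THEOREM 7. Assume the conditions of Theorem 2 except that $\mu$ is
unknown and $\hat{p}$ is defined as

$$\hat{p} = \min\{1, p^{*+}\}, \quad p^* = \{(m_2 - \bar{X}^2 - \lambda\sigma^2) / (1 - \lambda)\sigma^2\}.$$

$\hat{f}_{\lambda^*}(x)$, $\lambda^* = 1, \lambda$, is defined as in (6.2) and
(6.3) but with $\sigma^2$ instead of $\hat{\sigma}^2$, and
$\Delta_n(x) = (1 - \hat{p}) L_0(\lambda) \hat{f}_\lambda(x) - \hat{p} L_1(1) \hat{f}_1(x)$. Then,

$$r^*(\mathbf{t}_n, G) - r(G) = O(n^{-1/2}).$$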
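The estimator of $p$ used when $\sigma^2$ is known and $\mu$ is unknown can be sketched as follows. The sketch assumes the truncated form $\min\{1, \max\{0, p^*\}\}$ with $p^* = (m_2 - \bar{X}^2 - \lambda\sigma^2)/\{(1 - \lambda)\sigma^2\}$, where $m_2$ is the second sample moment; the function name and argument order are hypothetical.

```python
def estimate_p_unknown_mean(xs, sigma2, lam):
    """Truncated moment estimate of p when sigma2 is known and the mean is not.

    Uses the sample mean and second sample moment; lam is the known
    variance inflation factor (lam > 1).
    """
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    p_star = (m2 - xbar ** 2 - lam * sigma2) / ((1.0 - lam) * sigma2)
    return min(1.0, max(0.0, p_star))  # truncation to [0, 1]
```

For example, with $\sigma^2 = 1$ and $\lambda = 9$, a sample whose variance $m_2 - \bar{X}^2$ equals 4 gives $p^* = (4 - 9)/(-8) = 0.625$.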

PROOF. From Lemma 1, with $d_1 = d$, $3/4 < d < 1$, and $d_2 = 1$, we have

$$r^*(t_n^{(1)}, G) - r(G) \le 2^d \int |\Delta_G(x)|^{1-d} E|\Delta_n^{(1)}(x) - \Delta_n(x)|^d\,dx + 2 \int E|\Delta_n(x) - \Delta_G(x)|\,dx. \tag{6.5}$$

The definition of $\Delta_n(x)$ and $\Delta_G(x)$ along with repeated use of
the $c_r$-inequality yields

$$\int E|\Delta_n(x) - \Delta_G(x)|\,dx \le |L(a_0, \lambda)(1 - p)| \int E|\hat{f}_\lambda(x) - f_\lambda(x)|\,dx + |L(a_1, 1)p| \int E|\hat{f}_1(x) - f_1(x)|\,dx$$
$$+ |L(a_0, \lambda)| \int E|\hat{f}_\lambda(x)(\hat{p} - p)|\,dx + |L(a_1, 1)| \int E|\hat{f}_1(x)(\hat{p} - p)|\,dx.$$

Now let

$$A_n = \int E|\hat{f}_\lambda(x) - f_\lambda(x)|\,dx, \quad B_n = \int E|\hat{f}_1(x) - f_1(x)|\,dx,$$

$$C_n = \int E|\hat{f}_\lambda(x)(\hat{p} - p)|\,dx \quad \text{and} \quad D_n = \int E|\hat{f}_1(x)(\hat{p} - p)|\,dx.$$

If in $B_n$ we take the Taylor Series expansion of $\hat{f}_1(x)$ about
$\mu$, we get

$$\hat{f}_1(x) = f_1(x) + (\bar{X} - \mu) f'_1(x, \mu^*) \tag{6.6}$$

where

$$f'_1(x, \mu^*) = \{\partial f_1(x)/\partial\mu\}_{\mu = \mu^*} = \{f_1(x)\}_{\mu = \mu^*} \cdot (x - \mu^*)/\sigma^2 \tag{6.7}$$

and $\mu^* = \mu + \zeta(\bar{X} - \mu)$, $0 < \zeta < 1$. Hence

$$\int E|\hat{f}_1(x) - f_1(x)|\,dx = E(|\bar{X} - \mu|/\sigma^2) \int |x - \mu^*|\,f_1(x, \mu^*)\,dx, \tag{6.8}$$

with

$$f_{\lambda^*}(x, \mu^*) = \{f_{\lambda^*}(x)\}_{\mu = \mu^*}.$$

Now assume $\bar{X} < \mu$. $\bar{X}$ and $\mu$ are fixed so that as $x$
changes, $\mu^*$ will shift but always staying within the interval
$[\bar{X}, \mu]$, hence for

a) $x < \bar{X}$: $|x - \mu^*| \le |x - \mu|$ and $|x - \mu^*| \ge |x - \bar{X}|$ (6.9)

b) $x > \mu$: $|x - \mu^*| \ge |x - \mu|$ and $|x - \mu^*| \le |x - \bar{X}|$ (6.10)

c) $\bar{X} \le x \le \mu$: $0 \le |x - \mu^*| \le |\mu - \bar{X}|$. (6.11)

The integral in (6.8) can be broken into three integrals, each one
taken over one of the intervals indicated in a), b) and c). The inequalities
given in (6.9)-(6.11) can be used (de Alba (1974)) in the corresponding
integrals to obtain


(6.12) 


jj,  ^2  2 

(l/a)  / | x - / | (1/a  \r2ii)exp{-(r  - p ) /2ct  }dx  < 
v ^ + lp-pl/(r  + (p-p)  /(a  n/2it)  , 

where  v J is  the  first  absolute  moment  of  a standard  normal  random 

variable.  The  same  result  is  true  if  p > p-  In  general,  since  Ep  = p, 

2 3 2 — 

f E |7^(x)  - f^x)  |dx  = (1/<t)E{v  ’|p-pl  + (p  - p)  /o  + |p-pl  /(tr  } < 


-1/2 


(1/o){v\j  n x/  + (a^/a)n  * + (o^/o^  *J2n)n  ' } - 0(n  7 ) , 

lx  X X 

with  oZ  = poZ  + (1  - p)\o2.  A similar  argument  can  be  used  to  prove 

A =G(n-1/2). 
n 

Now in $D_n$ we have, by Fubini's Theorem,

$$\int E|\hat{f}_1(x)(\hat{p} - p)|\,dx = E\Bigl\{|\hat{p} - p| \int \hat{f}_1(x)\,dx\Bigr\} = E|\hat{p} - p| \le E|p' - p|, \tag{6.14}$$


3/2 


-1/2* 


(6.13) 


where

$$p' = (m_2 - \bar{X}^2 - \lambda\sigma^2) / (1 - \lambda)\sigma^2. \tag{6.15}$$

It can be shown (de Alba (1974)) that $E|p' - p| \le O(n^{-1/2})$ and so
$D_n = O(n^{-1/2})$. Here also, a similar argument can be used to prove
$C_n = O(n^{-1/2})$. Thus

$$\int E|\Delta_n(x) - \Delta_G(x)|\,dx = O(n^{-1/2}). \tag{6.16}$$

This completes the proof that the second term in (6.5) is $O(n^{-1/2})$.

We  shall  now  find  the  rate  of  convergence  of  the  first  term.  Using 
Pj  p(Xj(x))  and  ^ = p(Xj(x))  along  with  the  facts 

If.  *(x,  jl  )|  = ll/a^l  and  |f  *(x)  | < 1 1/d 's/IiT  | , X*  = 1,X 
\ 1 \ 


we  get 

|A(1)(x)  - A (x)|d  < 1 1/ a |d  I (1  - p.)Uan,\)  - p,L(a  , \ ) - (1  - p)L(a  X)  + 
n n 1 u l i u 

pL(aj,X)  Id  = |l/aN/27|d  |L(a0,X)  + = AQlp1-p|d, 

where  is  defined  in  an  obvious  manner.  Then 

E I A^(x)  - A (x)  |d  < A E |p  - p |d  < A E |pj  - p'  |d  • (6.17) 

n n u i u l 

Repeated  use  of  the  ^-inequality  gives 

1(1  -x)a2|dE|p'  - p'|d  <((n  - l)/n2)d(x2d  + EX^d)  + 

2dn"2d(|x|d  Yj  Elx.  |d+  Yj  EIX  X.  |d).  (6.18) 

i = 2 1 i = 2 


Let 

= e|x.  |d  < «,  i = 1,  ...,n  . (6.19) 

Substitution  of  (6.19)  in  (6.18)  and  using  the  Schwarz  inequality,  together 
with  (6.17),  yields 

E|A^(x)  - A (x)  |d  < A U2(l  - \)  | d{[(n  - l/n2)d(x2d  + v'  ) + 
n n u za 

2dn"2d(  |x|d(n  - l)v^  + (n  - Dv^]}  . (6.  20) 
Further, if $d > 1/2$,

/ UG(x)|1  dE|A^(x)  - An(x)  |ddx  < AJ)(T2d  + v'2d>{(n  " D/n}dn 

A'2d(Tdv'  + v'2d){(n  - l)/n}n1_2d  = 0(n'(2d_1))  , 

where  A'  is  a constant  and  t = f |x|a£  ...  ,,(x)dx  < 00 . Substitution 

0 a J \/(i-d) 

of (6.16) and (6.21) in (6.5) yields

$$r^*(t_n^{(1)}, G) - r(G) = O(n^{-1/2}).$$

From Lemma 2 the proof is complete.


(6.21) 


Q.  E.  D. 


7. Concluding Remarks.

In this paper we have presented a first approach via e.B. methods
to the problem of detecting outliers. Other alternatives are surely
possible. A first one would be to use a different model, perhaps
considering as outliers those observations which have a shifted mean rather
than a larger variance. An interesting problem would be to determine a
criterion to estimate the value of the increase in variance ($\lambda$) for the
spurious observations. Other articles on some of these extensions are
being prepared.